Evaluation of Summarization Schemes for Learning in Streams
Authors
Abstract
Traditional discretization techniques for machine learning from examples with continuous feature spaces are not efficient when the data arrives as a stream from an unknown, possibly changing, distribution. We present a time- and memory-efficient discretization technique based on computing ε-approximate exponential frequency quantiles, and prove bounds on the worst-case error introduced in computing information entropy over data streams compared to an offline algorithm that has no efficiency constraints. We compare the empirical performance of the technique, using it for feature selection, with (streaming adaptations of) two popular discretization methods, equal width binning and equal frequency binning, under a variety of streaming scenarios for real and artificial datasets. Our experiments show that ε-approximate exponential frequency quantiles are remarkably consistent in their performance, in contrast to the simple and efficient equal width binning, which performs quite well when the streams come from stationary distributions and quite poorly otherwise.
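To make the setting concrete, the following is a minimal Python sketch of the streaming equal-width binning baseline mentioned above: per-bin class counts are maintained online, and the conditional class entropy of the resulting discretization serves as a simple feature-selection score. The range bounds, bin count, class labels, and the EqualWidthBinner name are illustrative assumptions; the paper's ε-approximate exponential frequency quantile summary is not reproduced here.

import math
from collections import defaultdict

class EqualWidthBinner:
    """Streaming equal-width discretizer with per-bin class counts.

    Assumes the feature range [lo, hi] is known in advance; values
    outside the range are clipped into the first or last bin.
    """

    def __init__(self, lo, hi, n_bins):
        self.lo, self.n_bins = lo, n_bins
        self.width = (hi - lo) / n_bins
        # counts[bin_index][class_label] -> number of examples seen so far
        self.counts = defaultdict(lambda: defaultdict(int))
        self.total = 0

    def update(self, value, label):
        b = int((value - self.lo) / self.width)
        b = min(max(b, 0), self.n_bins - 1)  # clip out-of-range values
        self.counts[b][label] += 1
        self.total += 1

    def conditional_entropy(self):
        """H(class | bin): lower values suggest a more informative feature."""
        h = 0.0
        for bin_counts in self.counts.values():
            n_bin = sum(bin_counts.values())
            h_bin = -sum((c / n_bin) * math.log2(c / n_bin)
                         for c in bin_counts.values())
            h += (n_bin / self.total) * h_bin
        return h

# Toy usage on a tiny stream of (value, label) pairs.
binner = EqualWidthBinner(lo=0.0, hi=1.0, n_bins=10)
for value, label in [(0.05, "a"), (0.12, "a"), (0.85, "b"), (0.93, "b")]:
    binner.update(value, label)
print(binner.conditional_entropy())

An ε-approximate quantile summary would replace the fixed-width boundaries with data-driven cut points while keeping the same update-then-score loop, which is the comparison the experiments above are about.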
Similar articles
One-Class Learning and Concept Summarization for Vaguely Labeled Data Streams
In this paper, we formulate a new research problem of concept learning and summarization for one-class data streams. The main objective is to (1) allow users to label instance groups, instead of single instances, as positive samples for learning, and (2) summarize concepts labeled by users over the whole stream. The employment of the batch-labeling raises serious issues for stream-oriented conc...
A survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text, while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to sift through such a massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
Composable, Scalable, and Accurate Weight Summarization of Unaggregated Data Sets
Many data sets occur as unaggregated data sets, where multiple data points are associated with each key. In the aggregate view of the data, the weight of a key is the sum of the weights of data points associated with the key. Examples are measurements of IP packet header streams, distributed data streams produced by events registered by sensor networks, and Web page or multimedia requests to co...
Graph-Based Multi-Modality Learning for Topic-Focused Multi-Document Summarization
Graph-based manifold-ranking methods have been successfully applied to topic-focused multi-document summarization. This paper further proposes to use the multi-modality manifold-ranking algorithm for extracting topic-focused summary from multiple documents by considering the within-document sentence relationships and the cross-document sentence relationships as two separate modalities (graphs)....
Summarization evaluation for text and speech: issues and approaches
This paper surveys current text and speech summarization evaluation approaches. It discusses advantages and disadvantages of these, with the goal of identifying summarization techniques most suitable to speech summarization. Precision/recall schemes, as well as summary accuracy measures which incorporate weightings based on multiple human decisions, are suggested as particularly suitable in eva...